Yash Technologies | Data Engineer Interview Experience | 4 YoE



Round 1: Technical

✅ Tell me about yourself and any recent projects you have been a part of, followed by questions related to those projects.

✅ Explain the role of AWS Glue Data Catalog in a Spark job. How do you use it with Spark on AWS?
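A sketch of the answer they were likely after: the Glue Data Catalog acts as a Hive-compatible metastore, so Spark on EMR can query crawler-registered tables by name instead of raw S3 paths. The configuration below is the standard EMR wiring; the database and table names are hypothetical, and this fragment only runs on a cluster that actually has Glue Catalog access.

```python
# Config sketch, assuming an EMR cluster with Glue Data Catalog permissions;
# "sales_db.orders" is a hypothetical crawler-registered table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("glue-catalog-example")
    # On EMR, this factory class routes Hive metastore calls to the
    # Glue Data Catalog instead of a self-managed metastore.
    .config(
        "hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore."
        "AWSGlueDataCatalogHiveClientFactory",
    )
    .enableHiveSupport()
    .getOrCreate()
)

# Tables catalogued by Glue crawlers are now visible as ordinary Hive tables.
orders = spark.table("sales_db.orders")
orders.createOrReplaceTempView("orders")
spark.sql("SELECT COUNT(*) FROM orders").show()
```

The same classification (`spark-hive-site`) can be set cluster-wide when the EMR cluster is created, so every Spark job shares one catalog.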

✅ Write a Spark code snippet to read data from an S3 bucket (in CSV format), filter it based on some condition, and save the result back to S3.

✅ How do you handle data skew in Spark, especially when dealing with large datasets in AWS EMR?

✅ Explain how you would implement incremental data processing using AWS Glue and Spark.
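The Glue-native answer is job bookmarks (`transformation_ctx` on each source node), which persist how far the last run got. The plain-Python sketch below illustrates the underlying high-watermark idea that bookmarks automate; the state-file name and record shape are hypothetical, and in a real pipeline the watermark would live in S3 or DynamoDB, not a local file:

```python
import json
import os

def load_watermark(path):
    """Return the timestamp of the last record processed, or 0 on first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["last_ts"]
    return 0

def process_increment(records, path):
    """Process only records newer than the stored watermark, then advance it.
    This is the pattern AWS Glue job bookmarks implement for you."""
    wm = load_watermark(path)
    new = [r for r in records if r["ts"] > wm]
    if new:
        with open(path, "w") as f:
            json.dump({"last_ts": max(r["ts"] for r in new)}, f)
    return new
```

Running the same batch twice processes nothing the second time; only records with a newer `ts` get picked up.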

✅ How would you handle schema evolution when using AWS Glue for ETL jobs on data stored in S3 or Redshift?

✅ Explain the difference between AWS Glue DynamicFrame and Spark DataFrame. When would you use each in a data engineering pipeline?

✅ Describe how you would handle data quality checks and validation in an AWS-based data pipeline using Spark.

✅ How would you architect a solution to process streaming data using AWS Kinesis and Spark Structured Streaming?

✅ How do you handle large-scale data processing with AWS Lambda and Spark? What challenges do you face?

✅ How do you ensure fault tolerance and resilience in a Spark job running on AWS EMR?

Round 2: HR

✅ Discussion around my experience and projects, along with some resume-based questions.

✅ Reason for leaving previous company.

✅ What are you expecting in your next job role?